@dianyo
Contributor

@dianyo dianyo commented Sep 1, 2024

What does this PR do?

Addresses #8728.

In the original issue, I observed that using BrownianInterval could provide a slight speedup in the diffusion loop. However, the performance gain was primarily due to not setting halfway_tree=True in the arguments, which caused the results to differ from the original implementation.

Keeping the results identical before and after the switch therefore requires halfway_tree=True, which rules out this speedup. I still think it is worthwhile to migrate from BrownianTree to BrownianInterval, since the torchsde documentation marks BrownianTree as a legacy class.

Test Settings

I tested the numeric and image output using the following script on an A100 GPU running Ubuntu 20.04.1.

import torch
from diffusers.pipelines.stable_diffusion_xl import StableDiffusionXLPipeline
from diffusers.schedulers.scheduling_dpmsolver_sde import DPMSolverSDEScheduler

pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
scheduler = DPMSolverSDEScheduler()
pipe.scheduler = scheduler
pipe = pipe.to("cuda")

generator = torch.manual_seed(1996)
prompt = "A cat"
out = pipe(prompt, generator=generator)  # num_inference_steps is not passed, so the pipeline default is used
out.images[0].save("output_tree.png")

Test Output

Using BrownianTree
[Image: output_tree.png]

Using BrownianInterval
[Image: output_interval.png]

SHA result

sha256sum output_tree.png output_interval.png
55215252f7488ee788ce2932a9f305aa9880cf6e42ed9251e0f81eeaf1ea3879  output_tree.png
55215252f7488ee788ce2932a9f305aa9880cf6e42ed9251e0f81eeaf1ea3879  output_interval.png
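The same check can be scripted so it is easy to repeat for other schedulers or pipelines. This is a small sketch using only the Python standard library; the function names sha256_of and same_output are made up for this example.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(path_a, path_b):
    """True only if both files are byte-for-byte identical."""
    return sha256_of(path_a) == sha256_of(path_b)


# Example usage (file names as in the test above):
# same_output("output_tree.png", "output_interval.png")
```

Matching digests are a strict criterion: the encoded files must be byte-identical, so any numeric difference that survives quantization to the output format would show up as a mismatch.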

Who can review?

@yiyixuxu

@yiyixuxu
Collaborator

yiyixuxu commented Sep 4, 2024

thank you!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu
Collaborator

yiyixuxu commented Sep 4, 2024

Thanks for the PR! It looks good to me.
Would you be able to test an example for Stable Audio too? (It also uses BrownianTree!) https://huggingface.co/docs/diffusers/api/pipelines/stable_audio

@dianyo
Contributor Author

dianyo commented Sep 10, 2024

Hi @yiyixuxu,
I've run the test using the following script and the results are the same!

import torch
from diffusers import StableAudioPipeline
import soundfile as sf

repo_id = "stabilityai/stable-audio-open-1.0"
pipe = StableAudioPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# define the prompts
prompt = "The sound of a hammer hitting a wooden surface."
negative_prompt = "Low quality."

# set the seed for generator
generator = torch.Generator("cuda").manual_seed(0)

audio = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=200,
    audio_end_in_s=10.0,
    num_waveforms_per_prompt=3,
    generator=generator,
).audios

output = audio[0].T.float().cpu().numpy()
sf.write("output_tree.wav", output, pipe.vae.sampling_rate)

Here are the sha256 results:

# sha256sum output_interval.wav 
21e8033448080a52c0519b8df3c17ff1ce70eb5a235b0fb41293a6803e05ae2a  output_interval.wav
# sha256sum output_tree.wav 
21e8033448080a52c0519b8df3c17ff1ce70eb5a235b0fb41293a6803e05ae2a  output_tree.wav

Collaborator

@yiyixuxu yiyixuxu left a comment

thank you!

@yiyixuxu yiyixuxu merged commit b19827f into huggingface:main Sep 11, 2024
15 checks passed
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
migrate the BrownianTree to BrownianInterval

Co-authored-by: YiYi Xu <[email protected]>